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Standard multivariate analysis methods aim to identify and sum- 
marize the main structures in large data sets containing the descrip- 
tion of a number of observations by several variables. In many cases, 
spatial information is also available for each observation, so that 
a map can be associated to the multivariate data set. Two main 
objectives are relevant in the analysis of spatial multivariate data: 
summarizing covariation structures and identifying spatial patterns. 
In practice, achieving both goals simultaneously is a statistical chal- 
lenge, and a range of methods have been developed that offer trade- 
offs between these two objectives. In an applied context, this method- 
ological question has been and remains a major issue in community 
ecology, where species assemblages (i.e., covariation between species 
abundances) are often driven by spatial processes (and thus exhibit 
spatial patterns). 

In this paper we review a variety of methods developed in commu- 
nity ecology to investigate multivariate spatial patterns. We present 
different ways of incorporating spatial constraints in multivariate 
analysis and illustrate these different approaches using the famous 
data set on moral statistics in France published by Andre-Michel 
Guerry in 1833. We discuss and compare the properties of these dif- 
ferent approaches both from a practical and theoretical viewpoint. 

1. Introduction. A recent study [Friendly (2007)] revived Andre-Michel 
Guerry's (1833) Essai sur la Statistique Morale de la France. Guerry gath- 
ered data on crimes, suicide, literacy and other "moral statistics" for various 
departements (i.e., counties) in France. He provided the first real social data 
analysis, using graphics and maps to summarize this georeferenced multi- 
variate data set. The work of Friendly (2007) contained a historical part 
describing Guerry's life and work in detail. In a second part. Friendly re- 
analyzed Guerry's data using a variety of modern tools of multivariate and 
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spatial analysis. He considered two main approaches to analyzing a data set 
involving both multivariate and geographical aspects: data-centric (multi- 
variate analysis) and map-centric (multivariate mapping) displays. In the 
first approach, the multivariate structure is first summarized using standard 
analysis methods [e.g., principal component analysis, Hotelling (1933)] and 
visualization methods [e.g., biplot, Gabriel (1971)]. The geographic infor- 
mation is only added a posteriori to the graphs, using colors or other visual 
attributes. This approach thus favors the display of multivariate structures 
over spatial patterns. On the other hand, multivariate mapping (i.e., the rep- 
resentation of several variables on a single map using multivariate graphs) 
emphasizes the geographical context but fails to provide a relevant summary 
of the covariations between the variables. Moreover, multivariate mapping 
raises several technical issues such as the lack of readability of multivariate 
symbols (e.g., Chernoff faces), which can only be used to represent a few 
variables and are sometimes difficult for nonspecialists to interpret. Friendly 
(2007) stated that Guerry's questions, methods and data still present chal- 
lenges for multivariate and spatial visualization today. While he acknowl- 
edged progress in both exploratory spatial data analysis and multivariate 
methods, he also suggested that the integration of these data-centric and 
map- centric visualization and analysis is still incomplete. He concluded his 
paper with a motivating question: Who will rise to Guerry's challenge?. 

This challenge has been one of the major methodological concerns in com- 
munity ecology (and in other disciplines, e.g., public health) over the last few 
decades. Community ecology is a subdiscipline of ecology that aims to un- 
derstand the organization and causes of species associations. As community 
data are essentially multivariate (many species, many sites, many environ- 
mental factors and complex spatio-temporal sampling designs), questions 
about the structure and drivers of ecological communities have tradition- 
ally been addressed through multivariate analyses [Legendre and Legendre 
(1998)]. Hence, it has been and remains a very fertile field for the develop- 
ment and the application of multivariate techniques. One of the most active 
research goals in ecology today is to understand the relative importance 
of processes that determine the spatial organization of biodiversity at mul- 
tiple scales [Legendre (1993)]. As a consequence, the last decade has seen 
efforts in the methodological domain to render the multivariate analysis of 
community data more spatially explicit or, conversely, to generalize analy- 
ses of spatial distributions to handle the covariation of many species. These 
methods allow us to identify the main spatial patterns by considering simul- 
taneously both multivariate and geographical aspects of the data. They thus 
represent a first step toward the integration of data-centric and map-centric 
visualizations into a single method. 

In this paper we take up Friendly's challenge by demonstrating how sev- 
eral spatially-explicit multivariate methods developed initially in the context 
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Table 1 

Variable names, labels and descriptions. Note that four variables have been 
recorded in the form of "Population per. .." so that low values correspond 
to high rates, whereas high values correspond to low rates. Hence, for all of 
the variables, more (larger numbers) is "morally" better 



Label 


Description 


Crime_pers 


Population per crime against persons 


Crime_prop 


Population per crime against property 


Literacy 


Percent of military conscripts who can read and write 


Donations 


Donations to the poor 


Infants 


Population per illegitimate birth 


Suicides 


Population per suicide 



of community ecology could also be of benefit to other fields. We present dif- 
ferent ways of incorporating the spatial information into multivariate anal- 
ysis, using the duality diagram framework [Escoufier (1987)] to describe 
the mathematical properties of these methods. We illustrate these different 
methodological alternatives by reanalyzing Guerry's data. 

2. Standard approaches. We use the data set compiled by Michael Friend- 
ly and available at http://www.math.yorku.ca/SCS/Gallery/guerry/. 

This data set has been recently analyzed by Dykes and Brunsdon (2007) 
to illustrate a new interactive visualization tool and is now distributed in 
the form of an R package [see Dray and Jombart (2010) for details]. We con- 
sider six key quantitative variables (Table 1) for each of the 85 departements 
of France in 1830 (Corsica, an island and often an outlier, was excluded). In 
this section we focus on classical approaches that consider either the multi- 
variate or the spatial aspect of the data. In the next sections we will present 
methods that consider both aspects simultaneously. 

2.1. Multivariate analysis. Multivariate analysis allows us to identify 
and summarize the primary underlying structures in large data sets by re- 
moving any redundancy in the data. It aims to construct a low-dimensional 
space (e.g., 2 or 3 dimensions) that retains most of the original variability 
of the data. The classical output consists of graphical summaries of obser- 
vations and variables that are interpreted for the first few dimensions. 

2.1.1. The duality diagram theory. Multivariate data are usually recorded 
in a matrix X with n rows (observations) and p columns (variables). The 
duality diagram is a mathematical framework that defines a multivariate 
analysis setup using a set of three matrices. We can consider the (possi- 
bly transformed) data matrix X (n x p) as a part of a statistical triplet 
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(X, Q,D), where Q {p >^ p) and D (n x n) are usually symmetric positive 
definite matrices used as metrics [i.e., Q provides a metric for the variables 
(columns of X) and D provides a metric for the observations (rows of X)]. 
This unifying mathematical framework encompasses very general properties, 
which will be described, to the analysis of a triplet. For more details, the 
reader should consult Escoufier (1987), Holmes (2006) or Dray and Dufour 
(2007). The mathematical properties of each particular method (correspond- 
ing to a particular choice of matrices X, Q and D) can then be derived from 
the general properties of the diagram. Note that the analysis of the duality 
diagram associated to the triplet (X, Q,D) is equivalent to the generalized 
singular value decomposition [GSVD, e.g., Greenacre (1984), Appendix A] 
of X with the metrics Q and D. 

The analysis of the diagram consists of the eigen-decomposition of the 
operators XQX^D or X^DXQ. These two eigen-decompositions are related 
to each other (dual) and have the same eigenvalues. Thus, we have 

XQX^DK = KA,.], 
X'^DXQA = AA[^]. 

r is called the rank of the diagram, and the nonzero eigenvalues Ai > A2 > 
• • • > Ar > are stored in the diagonal matrix Aj,,] . 

K = [k^ , . . . , k''] is a n X r matrix containing the r nonzero associated 
eigenvectors (in columns). These vectors are D-orthonormalized (i.e., 
K'^DK = I^) and are usually called the principal components. 

A = [a^ , . . . , a''] is a p X r matrix containing the r nonzero eigenvectors 
(in columns). These vectors are Q-orthonormalized (i.e., A'^QA = Ir) and 
are usually called the principal axes. 

The row scores R = XQA are obtained by projection of the observations 
(rows of X) onto the principal axes. The vectors a^, a^, . . . , a*" successively 
maximize, under the Q-orthogonality constraint, the following quadratic 
form: 

(1) Q(a) = a^Q^X^DXQa. 

If D defines a scalar product, then we have (5(a) = ||XQa|||j. 

The column scores C = X^DK are obtained by projection of the variables 
(columns of X) onto the principal components. The vectors k^,k^,...,k^ 
successively maximize, under the D-orthogonality constraint, the following 
quadratic form: 

(2) 5(k) = k^D^^XQX^Dk. 

If Q defines a scalar product, then we have ^(k) = HX'^'DkHQ. Usually, the 
outputs (column and row scores) are only interpreted for the first few axes 
(dimensions) . 
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Fig. 1. Principal component analysis of Guerry's data. (A) Barplot of eigenvalues. (B) 
Correlation between variables and principal components. (C) Projections of departements 
on principal axes. The color of each square corresponds to a region of France. 



2.1.2. Application to Guerry's data. Here we consider p = 6 variables 
measured for n = 85 observations (departements of France). As only quan- 
titative variables have been recorded, principal component analysis [PCA, 
Hotelling (1933)] is well adapted. Applying PCA to the correlation ma- 
trix where Q = Ip, D = -I„ and X contains ^-scores, we obtain Q{sl) = 
||XQa||2j = var(XQa) and''5(k) = ||XTDk||2j = ^^^^ cor2(k, x^) from equa- 
tions (1) and (2). Hence, this PCA summarizes the data by maximizing 
simultaneously the variance of the projection of the observations onto the 
principal axes and the sum of the squared correlations between the principal 
component and the variables. 

For didactic purposes, following Friendly (2007), we interpret two dimen- 
sions, while the barplot of eigenvalues (Figure lA) would rather suggest 
a 1-D or a 3-D solution. The first two PCA dimensions account for 35.7% 
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and 20%, respectively, of the total variance. The correlations between vari- 
ables and principal components are represented on the correlation circle in 
Figure IB. As we have excluded Corsica (an outlier) in the present paper, the 
results are slightly different from those reported in Friendly (2007). The first 
axis is negatively correlated to literacy and positively correlated to property 
crime, suicides and illegitimate births. The second axis is aligned mainly 
with personal crime and donations to the poor. As we are also interested 
in spatial patterns, we have added geographical information in the form of 
color symbols on the factorial map of departements (Figure IC). Each color 
corresponds to one of five regions of France. The results are quite difficult 
to interpret, but some general patterns can be reported. For the first axis, 
the North and East are characterized by negative scores, corresponding to 
high levels of literacy and high numbers of suicides, crimes against property 
and illegitimate births. The second axis mainly contrasts the West (high 
donations to the the poor and low levels of crime against persons) to the 
South. 

2.2. Spatial autocorrelation. Exploratory spatial data analysis (ESDA) 
is a subset of exploratory data analysis [EDA, Tukey (1977)] that focuses on 
detecting spatial patterns in data [Haining (1990)]. In this context, spatial 
autocorrelation statistics, such as Moran (1948)'s Coefficient (MC) and the 
Geary (1954) Ratio, aim to measure and analyze the degree of dependency 
among observations in a geographical context [Cliff and Ord (1973)]. 

2.2.1. The spatial weighting matrix. The first step of spatial autocor- 
relation analysis is to define a n x n spatial weighting matrix, usually de- 
noted W. This matrix is a mathematical representation of the geographi- 
cal layout of the region under study [Bivand (2008)]. The spatial weights 
reflect a priori the absence (wij = 0), presence or intensity {wij > 0) of the 
spatial relationships between the locations concerned. Spatial weighting ma- 
trices can be usefully represented as graphs (neighborhood graphs), where 
nodes correspond to spatial units (departements) and edges to nonnull spa- 
tial weights. 

The simplest neighborhood specification is a connectivity matrix C, in 
which Cij = 1 if spatial units i and j are neighbors and Cij = otherwise. 
More sophisticated definitions [Getis and Aldstadt (2004); Dray, Legendre 
and Peres-Neto (2006)] are able to take into account the distances between 
the spatial units or the length of the common boundary between the regions 
for areal data. In the case of Guerry's data, we simply defined a binary 
neighborhood where two departements i and j are considered as neighbors 
(cjj = 1) if they share a common border (Figure 2). 

The connectivity matrix C is usually scaled to obtain a spatial weighting 
matrix W, most often with zero diagonal. The row-sum standardization 
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(elements sum to 1 in each row) is generahy preferred; it is obtained by 

Cij 

Alternative standardizations are discussed in Tiefelsdorf, Griffith and 
Boots (1999). 

2.2.2. Moran's coefficient. Once the spatial weights have been defined, 
the spatial autocorrelation statistics can then be computed. Let us consider 
the n-by-1 vector x = [xi • • • Xnf^ containing measurements of a quantitative 
variable for n spatial units. The usual formulation for Moran's coefficient of 
spatial autocorrelation [Cliff and Ord (1973); Moran (1948)] is 

MC(x) = y^n (^._^^2 ^^e^e 2^ = 2^1^ ^it^ ' + ^■ 

1^{2)^V l^i=iyX^ ^) (2) i=lj=l 

(3) 

MC can be rewritten using matrix notation: 
(4) ^C^^)=1TW1W 

where z = (I„ — l„l^/n)x is the vector of centered values (zj = Xi — x) 
and In is a vector of ones (of length n). 

The numerator of MC corresponds to the covariation between contiguous 
observations. This covariation is standardized by the denominator, which 
measures the variance among the observations. The significance of the ob- 
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Table 2 

Values of Moran's coefficient for the six 
variables. P-values obtained by a randomization 
testing procedure ( 999 permutations ) are given m 
parentheses 



MC 



Crime_pers 


0.411 (0.001) 


Crime_prop 


0.264 (0.001) 


Literacy 


0.718 (0.001) 


Donations 


0.353 (0.001) 


Infants 


0.229 (0.001) 


Suicides 


0.402 (0.001) 



served value of MC can be tested by a Monte Carlo procedure, in which 
locations are permuted to obtain a distribution of MC under the null hy- 
pothesis of random distribution. An observed value of MC that is greater 
than that expected at random indicates the clustering of similar values across 
space (positive spatial autocorrelation), while a significant negative value of 
MC indicates that neighboring values are more dissimilar than expected by 
chance (negative spatial autocorrelation). 

We computed MC for Guerry's data set using the row-standardized defini- 
tion of the spatial weighting matrix associated with the neighborhood graph 
presented in Figure 2. A positive and significant autocorrelation is identified 
for each of the six variables (Table 2). Thus, the values of literacy are the 
most covariant in adjacent departments, while illegitimate births (Infants) 
covary least. 

2.2.3. Moran scatterplot. If the spatial weighting matrix is row-standar- 
dized, we can define the lag vector z = Wz (i.e., Zi = X^j=i WijXj) composed 
of the weighted (by the spatial weighting matrix) averages of the neighboring 
values. Equation (4) can then be rewritten as 

(5) MC(x) = ^, 

z^ z 

since in this case l"''Wl = n. Equation (5) shows clearly that MC measures 
the autocorrelation by giving an indication of the intensity of the linear as- 
sociation between the vector of observed values z and the vector of weighted 
averages of neighboring values z. Anselin (1996) proposed to visualize MC 
in the form of a bivariate scatterplot of z against z. A linear regression can 
be added to this Moran scatterplot, with slope equal to MC. The Moran 
scatterplot is a very nice graphical tool to evaluate and represent the de- 
gree of spatial autocorrelation, the presence of outliers or local pockets of 
nonstationarity [Anselin (1995)]. 
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Fig. 3. Moran scatterplot for Literacy. Dotted lines corresponds to means. 



Considering the Literacy variable of Guerry's data, the Moran scatter- 
plot (Figure 3) clearly shows strong autocorrelation. It also shows that the 
Hautes-Alpes departement has a slightly outlying position characterized by 
a high value of Literacy compared to its neighbors. This departement can 
be considered as a leverage point that drags down the assessment of the link 
between Literacy and spatial-lagged literacy (i.e., MC). This is confirmed 
by different diagnostic tools [DFFITS, Cook's D, e.g., Chatterjee and Hadi 
(1986)] adapted to the linear model. 

2.3. Toward an integration of multivariate and geographical aspects. The 
integration of multivariate and spatial information has a long history in ecol- 
ogy. The simplest approach considered a two-step procedure where the data 
are first summarized with multivariate analysis such as PCA. In a second 
step, univariate spatial statistics or mapping techniques are applied to PCA 
scores for each axis separately. Goodall (1954) was the first to apply multi- 
variate analysis in ecology, and he integrated spatial information a posteriori 
by mapping PCA scores onto the geographical space using contour lines. One 
can also test for the presence of spatial autocorrelation for the first few scores 
of the analysis, with univariate autocorrelation statistics such as MC. For 
instance, we mapped scores of the departements for the first two axes of the 
PCA of Guerry's data (Figure 4). Even if PCA maximizes only the variance 
of these scores, there is also a clear spatial structure, as the scores are highly 
autocorrelated. The map for the first axis corresponds closely to the split 
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Fig. 4. Principal component analysis of Guerry's data. Map of departements' scores for 
the first (left) and second (right) PGA axes. Values of Moran's coefficient and associated 
P-values obtained by a randomization testing procedure (999 permutations) are given. 

between la France eclairee (North-East characterized by an higher level of 
Literacy) and la France obscure. 

It is very simple to carry out this two-step approach but it has the major 
disadvantage of being indirect, as it considers the spatial pattern only af- 
ter summarizing the main structures of the multivariate data set. Anselin, 
Syabri and Smirnov (2002) proposed a more direct approach by extending 
the Moran scatterplot to the bivariate case. If we consider two centered vari- 
ables zi and Z2, the bivariate Moran scatterplot represents Z2 = Wz2 on the 
vertical axis and zi on the horizontal axis. In a case with more than two 
variables, one can produce bivariate Moran scatterplots for all combinations 
of pairs of variables. However, this approach becomes difficult to use when 
the number of variables increases. In the next section we present several ap- 
proaches that go one step further by considering the identification of spatial 
structures and the dimensionality reduction simultaneously. 

3. Spatial multivariate analysis. Over the last two decades, several ap- 
proaches have been developed to consider both geographical and multivari- 
ate information simultaneously. The multivariate aspect is usually treated by 
techniques of dimensionality reduction similar to PCA. On the other hand, 
several alternatives have been proposed to integrate the spatial information. 
We review various alternatives in the following sections. 

3.1. Spatial partition. One alternative is to consider a spatial partition 
of the study area. In this case, the spatial information is coded as a catego- 
rical variable, and each category corresponds to a region of the whole study 
area. This partitioning can be inherent to the data set (e.g., administrative 
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units) or can be constructed using geographic information systems [e.g., 
grids of varying cell size in Dray, Pettorelli and Chessel (2003)]. For instance, 
Guerry's data contained a partition of France into 5 regions (Figure 1). 

In this context, searching for multivariate spatial structures would lead us 
to look for a low-dimensional view that maximizes the difference between the 
regions. To this end. Friendly (2007) used discriminant analysis, a widely- 
used method providing linear combinations of variables that maximize the 
separation between groups as measured by an univariate F statistic. How- 
ever, this method suffers from some limitations: it requires the number of 
variables to be smaller than the number of observations, and it is impaired by 
collinearity among variables. Here we used an alternative and lesser known 
approach, the between-class analysis [BCA, Doledec and Chessel (1987)], to 
investigate differences between regions. Unlike discriminant analysis, BCA 
maximizes the variance between groups (without accounting for the variance 
within groups) and is not subject to the restrictions applying to the former 
method. 

BCA associates a triplet (X, Q,D) to a, n x g matrix Y of dummy vari- 
ables indicating group membership. Let A be the g x p matrix of group 
means for the p variables and Dy be the g x g diagonal matrix of group 
weights derived from the matrix D of observation weights. By definition, 
we have A = (YTdY)-IyTdX and Dy = (Y^DY). BCA corresponds to 
the analysis of (A, Q,Dy) and diagonalizes the between-groups covariance 
matrix A'^DyAQ. 

Here, 28.8% of the total variance (sum of eigenvalues of PCA) corre- 
sponds to the between-regions variance (sum of the eigenvalues of BCA). 
The barplot of eigenvalues indicates that two axes should be interpreted 
(Figure 5A). The first two BCA dimensions account for 59% and 30.2%, 
respectively, of the between-regions variance. The coefficients used to con- 
struct the linear combinations of variables are represented on Figure 5B. 
The first axis opposed literacy to property crime, suicides and illegitimate 
births. The second axis is mainly aligned with personal crime and donations 
to the poor. The factorial map of departements (Figure 5C) and the maps 
of the scores (Figure 5D, E) show the spatial aspects. The results are very 
close to those obtained by PCA: the first axis contrasted the North and the 
East {la France eclairee) to the other regions, while the South is separated 
from the other regions by the second axis. The high variability of the region 
Center is also noticeable. In contrast, the South is very homogeneous. 

3.2. Spatial explanatory variables. Principal component analysis with re- 
spect to the instrumental variables [PCAIV, Rao (1964)], also known as re- 
dundancy analysis [van den Wollenberg (1977)], is a direct extension of PCA 
and multiple regression adapted to the case of multivariate response data. 
The analysis associates a, n x q matrix Z of explanatory variables to the 
triplet (X, Q,D) where the matrix X contains the response variables. The 
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Fig. 5. Between-class analysis of Guerry's data. (A) Barplot of eigenvalues. (B) Coeffi- 
cients of variables. (C) Projections of departements on the EC A axes. Map of departements 
scores for the first (D) and second (E) axes. The different colors correspond to regions of 
France. 



D-orthogonal projector Pz = Z(Z"'"DZ)~^Z"'"D is first used in a multivari- 
ate regression step to compute a matrix of predicted values X = PzX. The 
second step of PCAIV consists of the PCA of this matrix of predicted val- 
ues and corresponds then to the analysis of the triplet (X,Q,D). Whereas 
PCA maximizes the variance of the projection of the observations onto the 
principal axes, PCAIV maximizes the variance explained by Z. 

PCAIV and related methods, such as canonical correspondence analysis 
[ter Braak (1986)], have been often used in community ecology to identify 
spatial relationships. The spatial information is introduced in the matrix Z 
under the form of spatial predictors and the analysis maximized then the 
"spatial variance" (i.e., the variance explained by spatial predictors). Note 
that BCA can also be considered as a particular case of PCAIV, where the 
explanatory variables are dummy variables indicating group membership. 

3.2.1. Trend surface of geographic coordinates. From the EDA point of 
view, the data exploration has been conceptualized by Tukey (1977) in 
the quasi-mathematical form DATA = SMOOTH -|- ROUGH. Trend sur- 
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Fig. 6. Maps of the terms of a second-degree orthogonal polynomial. Centroids of 
departements have been used as original coordinates to construct the polynomial. 

face analysis is the oldest procedure for separating large-scale structure 
{SMOOTH) from random variation (ROUGH). Student (1914) proposed 
expressing observed values in time series as a polynomial function of time, 
and mentioned that this could be done for spatial data as well. Borcard, 
Legendre and Drapeau (1992) extended this approach to the spatial and 
multivariate case by introducing polynomial functions of geographic coor- 
dinates as predictors in PCAIV. We call this approach PCAIV-POLY in 
the rest of the paper. Usually, polynomials of degree 2 or 3 are used; spu- 
rious correlations between these spatial predictors can be removed using an 
orthogonalization procedure to obtain orthogonal polynomials. 

The centroids of departements of France were used to construct a second- 
degree orthogonal polynomial (Figure 6). 

Here, 32.4% of the total variance (sum of eigenvalues of PCA) is explained 
by the second-degree polynomial (sum of eigenvalues of PCAIV). The first 
two dimensions account for 51.4% and 35.2%, respectively, of the explained 
variance. The outputs of PCAIV-POLY (coefficients of variables, maps of 
departements scores, etc.) are not presented, as they are very similar to 
those obtained by BCA. 

3.2.2. Moran's eigenvector maps. An alternative way to build spatial 
predictors is by the diagonalization of the spatial weighting matrix W. 
de Jong, Sprenger and van Veen (1984) have shown that the upper and 
lower bounds of MC for a given spatial weighting matrix W are equal to 
Amax("'/l^Wl) and Amm("'/l'^Wl), where Amax and Amm are the extreme 
eigenvalues of = HWH where H = (I — ll"^ /n) is a centering operator. If 
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Fig. 7. Maps of the first ten MEM of the spatial weighting matrix associated to the 
neighborhood graph presented on Figure 2. By definition, MEM are orthogonal vectors 
maximizing the values of Moran's coefficient. 



a nonsymmetric spatial weighting matrix W* has been defined, the results 
can be generalized using W = (W* + W*'^)/2. 

Moran's eigenvector maps [MEM, Dray, Legendre and Peres-Neto (2006)] 
are the n — 1 eigenvectors of ft. They are orthogonal vectors with a unit 
norm maximizing MC [Griffith (1996)]. MEM associated with high positive 
(or negative) eigenvalues have high positive (or negative) autocorrelation. 
MEM associated with eigenvalues with small absolute values correspond to 
low spatial autocorrelation, and are not suitable for defining spatial struc- 
tures [Dray, Legendre and Peres-Neto (2006)]. Unlike polynomial functions, 
MEM have the ability to capture various spatial structures at multiple scales 
(coarse to fine scales). MEM have been used for spatial filtering purposes 
[Griffith (2003); Getis and Griffith (2002)] and introduced as spatial pre- 
dictors in linear models [Griffith (1996, 2000)], generalized linear models 
[Griffith (2002, 2004)] and multivariate analysis [Dray, Legendre and Peres- 
Neto (2006); Jombart, Dray and Dufour (2009)]. 

We used the spatial weighting matrix associated to the neighborhood 
graph presented on Figure 2 to construct MEM. The first ten MEM, corre- 
sponding to the highest levels of spatial autocorrelation, have been mapped 
in Figure 7 and introduced as spatial explanatory variables in PCAIV. We 
call this approach PCAIV-MEM in the rest of the paper. 44.1% of the total 
variance (sum of eigenvalues of PGA) is explained by the first ten MEM (sum 
of eigenvalues of PCAIV). The first two dimensions account for 54.9% and 
26.3%, respectively, of the explained variance. The outputs of PCAIV-MEM 
(coefficients of variables, maps of departement scores, etc.) are not presented, 
as they are very similar to those obtained by the previous analyses. 

3.3. Spatial graph and weighting matrix. The MEM framework intro- 
duced the spatial information into multivariate analysis through the eigen- 
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decomposition of the spatial weighting matrix. Usuahy, we consider only 
a part of the information contained in this matrix because only a subset of 
MEM are used as regressors in PCAIV. In this section we focus on multi- 
variate methods that consider the spatial weighting matrix under its original 
form. 

Lebart (1969) was the first to introduce a neighborhood graph into a mul- 
tivariate analysis. Following this initial work, many methods have been 
mainly developed by the French school of statisticians [Le Foil (1982); Be- 
nali and Escofier (1990); Meot, Chessel and Sabatier (1993)]. These contri- 
butions were important from a methodological point of view, but have been 
rarely used for applied problems. Indeed, they have a major drawback in 
their objectives: they maximize the local variance (i.e., the difference be- 
tween neighbors), while users more often want to minimize this quantity 
and maximize the spatial correlation (i.e., the SMOOTH). 

Wartenberg (1985) was the first to develop a multivariate analysis based 
on MC. His work considered only normed and centered variables (i.e., normed 
PC A) for the multivariate part and a binary symmetric connectivity matrix 
for the spatial aspect. Dray, Said and Debias (2008) generalized Warten- 
berg's method by introducing a row-standardized spatial weighting matrix 
in the analysis of a statistical triplet (X, Q,D). Hence, this approach is 
very general and allows us to define spatially-constrained versions of var- 
ious methods (corresponding to different triplets) such as correspondence 
analysis or multiple correspondence analysis. 

By extension of the lag vector, a lag matrix X = WX can be defined. 
The two tables X and X are fully matched, that is, they have the same 
columns (variables) and rows (observations). MULTISPATI (Multivariate 
spatial analysis based on Moran's index) aims to identify multivariate spa- 
tial structures by studying the link between X and X using the coinertia 
analysis [Doledec and Chessel (1994); Dray, Chessel and Thioulouse (2003a)] 
of a pair of fully matched tables [Torre and Chessel (1995); Dray, Chessel and 
Thioulouse (2003b)]. It corresponds to the analysis of the statistical triplet 
(X, Q, i(W^D + DW)). The objective of the analysis is to find a vector a 
(with llallq) maximizing the quantity defined in equation (1): 

Q(a) = a^Q^X^KW^D^ + DW)XQa 

= i(a^Q^X^W^D^XQa + a^Q^X^DWXQa) 

(6) 

= i(XQa, WXQa)D + (WXQa,XQa)D 

= a'^Q^X^DWXQa = r^DWr = r^Df . 

This analysis maximizes the scalar product between a linear combination 
of original variables (r = XQa) and a linear combination of lagged variables 
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(r = WXQa). Equation (6) can be rewritten as 



Q(a) 



a^Q^X^DWXQa 



a^Q^X^DXQa 



(7) 



:|2 
iId- 



aTQTxTDXQa 

MCD(XQa) • ||XQa||f) = MCD(r) 

MULTISPATI finds coefficients (a) to obtain a linear combination of vari- 
ables (r = XQa) that maximizes a compromise between the classical mul- 
tivariate analysis (||r||Q) and a generalized version of Moran's coefficient 
[MCD(r)]- The only difference between the classical Moran's coefficient [equa- 
tion (4)] and its generalized version MCd is that the second one used a gen- 
eral matrix of weights D, while the first considers only the usual case of 
uniform weights (D = ^In)- 

In practice, the maximum of equation (7) is obtained for a = a^, where a^ 
is the first eigenvector of the Q-symmetric matrix ^X'^(W'^D + DW)Q. 
This maximal value is equal to the associated eigenvalue Ai. Further eigen- 
vectors maximize the same quantity with the additional constraint of or- 
thogonality. 

MULTISPATI has been applied to Guerry's data (Figure 8). The barplot 
of eigenvalues (Figure 8 A) suggests two main spatial structures. The coeffi- 
cients used to construct the linear combinations of variables are represented 
in Figure 8B. The first axis opposes literacy to property crime, suicides and 
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Fig. 8. MULTISPATI of Guerry's data. (A) Barplot of eigenvalues. (B) Coefficients of 
variables. Mapping of scores of plots on the first (C) and second (E) axis and of lagged 
scores (averages of neighbors weighted by the spatial connection matrix) for the first (D) 
and second (F) axis. Representation of scores and lagged scores (G) of plots (for each 
departement, the arrow links the score to the lagged score). Only the departements discussed 
in the text are indicated by their labels. 
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illegitimate births. The second axis is aligned mainly with personal crime 
and donations to the poor. The maps of the scores (Figure 8C, E) show that 
the spatial structures are very close to those identified by PCA. The sim- 
ilarity of results between PCA and its spatially optimized version confirm 
that the main structures of Guerry's data are spatial. 

MULTISPATI maximizes the product between the variance and the spa- 
tial autocorrelation of the scores, while PCA (Figure 1) maximizes only the 
variance. Hence, there is a loss of variance compared to PCA (2.14 versus 
2.017 for axis 1; 1.201 versus 1.177 for axis 2) but a gain of spatial autocor- 
relation (0.551 versus 0.637 for axis 1; 0.561 versus 0.59 for axis 2). 

Spatial autocorrelation can be seen as the link between one variable and 
the lagged vector [equation (5)]. This interpretation is used to construct the 
Moran scatterplot and can be extended to the multivariate case in MULTI- 
SPATI by analyzing the link between scores (Figure 8C, E) and lagged scores 
(Figure 8D, F). Each departement can be represented on the factorial map 
by an arrow (the bottom corresponds to its score, the head corresponds to 
its lagged score, Figure 8G). A short arrow reveals a local spatial similarity 
(between one plot and its neighbors), while a long arrow reveals a spatial 
discrepancy. This viewpoint can be interpreted as a multivariate extension 
of the local index of spatial association [Anselin (1995)]. For instance, Aude 
has a very small arrow, indicating that this departement is very similar to 
its neighbors. On the other hand, the arrow for Haute-Loire has a long hori- 
zontal length which reflects its high values for the variables Infants (31017), 
Suicides (163241) and Crime_prop (18043) compared to the average values 
over its neighbors (27032.4, 60097.8 and 10540.8 for these three variables). 
Finistere corresponds to an arrow with a long vertical length which is due 
to its high values compared to its neighbors for Donations (23945 versus 
12563) and Crime_pers (29872 versus 25962). 

4. Conclusions. We have presented different ways of incorporating the 
spatial information in multivariate analysis methods. While PCA is not 
constrained, spatial information can be introduced as a partition (BCA), 
a polynomial of geographic coordinates (PCAIV-POLY), a subset of Moran's 
eigenvector maps (PCAIV-MEM) or a spatial neighborhood graph (MUL- 
TISPATI). This variety of constraints induces a diversity of criteria to be 
maximized by each method: variance (PCA), variance explained by a spatial 
partition (BCA) or by spatial predictors (PCAIV-POLY, PCAIV-MEM), 
product of the variance by the spatial autocorrelation (MULTISPATI). By 
presenting these methods in the duality diagram framework, we have shown 
that these approaches are very general, and can be applied to virtually any 
multivariate analysis. 

These theoretical considerations have practical implications concerning 
the use of these methods in applied studies. PCA is a very general method 
allowing one to identify the main spatial and nonspatial structures. BCA 
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maximally separates the groups corresponding to a spatial partition. It is 
thus adapted when a study focuses on spatial structures induced by a parti- 
tioning defined a priori (e.g., administrative units, etc.). If such an a priori 
partitioning does not exist, one can easily define such a partition albeit in- 
troducing some element of subjectivity in the consideration of the spatial in- 
formation. This problem is solved by PCAIV-POLY, which uses polynomials 
to incorporate the spatial information. Polynomials are easily constructed, 
but their use is only satisfactory when the sampling area is roughly homo- 
geneous and the sampling design is nearly regular [Norcliffe (1969)]. Other 
limitations to their use have been reported in the literature such as the ar- 
bitrary choice of the degree and their ability to account only for smooth 
broad-scale spatial patterns [Dray, Legendre and Peres-Neto (2006)]. 

The use of graphs and spatial weighting matrices allows the construc- 
tion of more efficient and flexible representations of space. Binary spatial 
weighting matrices can be constructed using distance criteria or tools de- 
rived from graph theory [Jaromczyk and Toussaint (1992)]; they may also 
describe spatial discontinuities, boundaries or physical barriers in the land- 
scape. Spatial weights can be associated to the binary links to represent 
the spatial heterogeneity of the landscape using functions of geographic dis- 
tances or least-cost links between sampling locations [Fall et al. (2007)] or 
any other proxies/measures of the potential strength of connection between 
the locations. MEM are obtained by the eigen-decomposition of the spa- 
tial weighting matrix W. For a data set with n observations, this eigen- 
decomposition produces n — 1 MEM. Hence, a subset of these spatial pre- 
dictors must be selected to avoid overfitting in the multivariate regression 
step of PCAIV. Concerning Guerry's data set, we choose the first ten MEM 
arbitrarily. Other objective selection procedures have been proposed in the 
literature. For instance, the criteria can be based on the minimization of the 
autocorrelation in residuals [Tiefelsdorf and Griffith (2007)] or on the max- 
imization of the fit of the model [Blanchet, Legendre and Borcard (2008)]. 
Hence, only a part of the spatial information contained in W (corresponding 
to the subset of MEM retained by the selection procedure) is considered in 
PCAIV. In MULTISPATI, the spatial weighting matrix is used in its origi- 
nal form, so that the whole spatial information contained in it is taken into 
account in the multivariate analysis. 

Even if the methods presented are quite different in their theoretical and 
practical viewpoints, their applications to Guerry's data set yield very sim- 
ilar results. We provided a quantitative measure of this similarity by com- 
puting Procrustes statistics [Peres-Neto and Jackson (2001); Dray, Chessel 
and Thioulouse (2003b)] between the scores of the departements on the first 
two axes for the different analyses (Table 3). All the values of the statistics 
are very high and significant; this confirms the high concordance between 
the outputs of the different methods. This similarity of results is due to the 
very clear structures of the data set and to the high level of autocorrelation 
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Table 3 

Procrustes statistics measuring the concordance between the scores of 

the departements on the first two axes of the different analyses. 
A value of 1 indicates a perfect match between two configurations of 
departement scores. Randomization procedures with 999 permutations 
have been used to test the significance of the concordance. All the 
statistics are significant (p — 0.001 ) 





PCA 


BCA 


PCAIV-POLY 


PCAIV-MEM 


BCA 


0.979 








PCAIV-POLY 


0.979 


0.990 






PCAIV-MEM 


0.989 


0.994 


0.995 




MULTISPATI 


0.987 


0.995 


0.995 


0.999 



of these structures (Figure 4). In this example the main advantage of the 
spatiahy-constrained methods is in the choice of the number of dimensions 
to interpret; while the barplot of eigenvalues of PCA can be difficult to in- 
terpret (see above and Figure lA), it is clear that two spatial dimensions 
must be interpreted in BCA (Figure 5 A) or MULTISPATI (Figure 8A). 

In the case of Guerry's data, the very simple and clear-cut structures seem 
to be recovered by ah the approaches presented here. In more complex data 
sets, spatially constrained methods prove superior to standard approaches 
for detecting spatial multivariate patterns. Dray, Said and Debias (2008) 
presented an example where a standard multivariate method was unable to 
identify any structure and is outperformed by MULTISPATI, which allows 
us to discover interesting spatial patterns. In general, if the objective of 
a study is to detect spatial patterns, it would be preferable to use a spatially- 
constrained method. PCA could also be useful, but it is designed to identify 
the main structures that can or cannot be spatialized. On the other hand, 
spatial multivariate methods are optimized to focus on the spatial aspect 
and would generally produce clearer and smoother results. The outputs and 
interpretation tools of these methods are also more adapted to visualizing 
and quantifying the main multivariate spatial structures. 

From a methodological viewpoint, these approaches provide new ways of 
taking into account the complexity of sampling designs in the framework of 
multivariate methods. Following the famous paper of Legendre (1993), the 
analysis of spatial structures has been a major issue in community ecology 
and originated several methodological developments in the field of spatial 
multivariate analysis. To date, the most integrated and flexible approaches 
have used a spatial weighting matrix which can be seen as a general way to 
consider spatial proximities. Potential methodological perspectives are im- 
portant, as these approaches could easily be extended to any other sampling 
constraints that can be expressed in the form of a matrix of similarities 
between the observations. 
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SUPPLEMENTARY MATERIAL 

Implementation in R (DOI: 10.1214/10-AOAS356SUPP; .zip). This web- 
site hosts an R package (Guerry) containing the Guerry's data set (maps and 
data) . The package contains also a tutorial (vignette) showing how to repro- 
duce the analyses and the graphics presented in this paper using mainly the 
package ade4 [Dray and Dufour (2007)]. The package Guerry is also available 
on GRAN and can be installed using the install .packages ( ' 'Guerry' ') 
command in a R session. 
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